Identifying Latent Semantics in High-Dimensional Web Data
Abstract
Search engines have become an indispensable tool for obtaining relevant information on the Web. A search engine often returns a large number of results, including many irrelevant items that make the result set hard to interpret. Search engines therefore need to be enhanced to discover the latent semantics in high-dimensional web data. This paper presents a novel framework for that purpose, the Latent Semantic Manifold (LSM), together with its implementation and evaluation. LSM is a mixture model based on concepts from topology and probability. It finds the latent semantics in web data and represents them as homogeneous groups. In experimental evaluation, the LSM framework outperformed the other frameworks it was compared with. In addition, we used the framework to develop a tool, which was deployed for two years at two sites in Taiwan: a library and a biomedical engineering laboratory. The tool assisted researchers in performing semantic searches of the PubMed database. The evaluation and deployment of LSM suggest that the framework could enhance the functionality of currently available search engines by discovering latent semantics in high-dimensional web data.
Similar resources
Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, the Web is one of the most important services on the Internet and has had the greatest impact on the spread of the Internet in human societies. Internet penetration has been an effective factor in the growth of the volume of information on the Web. The massive growth of informat...
Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold
Latent Semantic Analysis involves natural language processing techniques for analyzing relationships between a set of documents and the terms they contain, by producing a set of concepts (related to the documents and terms) called semantic topics. These semantic topics assist search engine users by providing leads to more relevant documents. We develop a novel algorithm called Latent Semant...
TripleRank: Ranking Semantic Web Data by Tensor Decomposition
The Semantic Web fosters novel applications targeting a more efficient and satisfying exploitation of the data available on the web, e.g. faceted browsing of linked open data. Large amounts and high diversity of knowledge in the Semantic Web pose the challenging question of appropriate relevance ranking for producing fine-grained and rich descriptions of the available data, e.g. to guide the us...
Analytical Semantics Visualization for Discovering Latent Signals in Large Text Collections
Considering the increasing pressure of competition and high dynamics of markets, the early identification and specific handling of novel developments and trends becomes more and more important for competitive companies. Today, those signals are encoded in large amounts of textual data like competitors’ web sites, news articles, scientific publications or blog entries which are freely available ...
Research on Image Semantic Information Mining Based On Latent Dirichlet Allocation Model
Focusing on the lack of semantic description in image identification and on methods for mapping low-level semantics to high-level semantics, this paper describes experiments on identifying image semantic information using the LDA model, which can map image visual features to high-level semantics, and experiments on the data sets of Corel 5k and Corel 3...
Publication date: 2013